In this report, the electric consumption data of Turkey will be analyzed and used to forecast with an appropriate ARMA model for two weeks.
Necessary Libraries
library(data.table)
library(lubridate)
library(tidyverse)
library(zoo)
library(urca)
library(forecast)
Data preparation and necessary manipulations
data_path='/Users/tolgaerdogan/Documents/GitHub/spring21-TolgaErdogann/Data/RealTimeConsumption-01012016-20052021.csv'
consumption=fread(data_path)
consumption=consumption[!is.na(Consumption)]
consumption[,Date:=as.Date(Date)]
head(consumption)
## Date Hour Consumption
## 1: 2016-01-01 0 26277.24
## 2: 2016-01-01 1 24991.82
## 3: 2016-01-01 2 23532.61
## 4: 2016-01-01 3 22464.78
## 5: 2016-01-01 4 22002.91
## 6: 2016-01-01 5 21957.08
consumption=consumption %>%
mutate(Date = make_datetime(year=year(Date), month=month(Date), day=mday(Date), hour=Hour))
consumption$Consumption<-as.numeric(consumption$Consumption)
str(consumption)
## Classes 'data.table' and 'data.frame': 47208 obs. of 3 variables:
## $ Date : POSIXct, format: "2016-01-01 00:00:00" "2016-01-01 01:00:00" ...
## $ Hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Consumption: num 26277 24992 23533 22465 22003 ...
## - attr(*, ".internal.selfref")=<externalptr>
Electric consumption should be affected by hourly, weekly, and also monthly basis. In this part seasonality of the data will be investigated in these time regions.
At first, data will be investigated on an hourly basis to check whether there is a seasonality in day hours. To check the seasonality, I converted the hourly data to a time series object with frequency 24.
hourly = consumption[, Consumption]
ts_hourly<-ts(hourly, freq=24)
ts.plot(ts_hourly,main="Hourly usage")
acf(ts_hourly)
From the autocorrelation diagram, we can see the correlation in the 1st and 25th lines, which can conclude that seasonality occurs on an hourly basis. To further investigate I will decompose the series with additive decomposition to check the seasonal component. I preferred additive decomposition since the magnitude of the seasonal component does not depend on the magnitude of the data, in other words as we can see from the graph of hourly consumption the magnitude of seasonality does not change as the series go up or down.
Hourly_additive<-decompose(ts_hourly, type="additive")
plot(Hourly_additive)
plot(Hourly_additive$seasonal[1:100], type='l')
When we check the seasonal component, we can clearly see the pattern in 24 hours which makes sense since we expect the consumption to decrease in night hours.
To check the weekly seasonality data is converted to a daily basis by taking the mean values of day hours.
daily = consumption[, list(Consumption=mean(Consumption, na.rm=TRUE)), by=list(as.Date(Date))]
ts_daily=ts(daily$Consumption,freq=7)
ts.plot(ts_daily,main="Daily Usage")
acf(ts_daily)
On weekly basis, from the autocorrelation diagram, we can see the correlation between weekdays. To further investigate I will check the seasonal component with additive decomposition
Daily_additive<-decompose(ts_daily, type = "additive")
plot(Daily_additive)
plot(Daily_additive$seasonal[1:30], type='l')
When we check the seasonal component, we can clearly see the pattern in 7 days which makes sense since we expect the consumption to decrease on weekends.
To check the weekly seasonality data is converted to a monthly basis with necessary manipulations.
monthly = consumption[, list(Consumption=mean(Consumption, na.rm=TRUE)), by=list(month(Date), year(Date))]
ts_monthly=ts(monthly$Consumption,freq=12)
ts.plot(ts_monthly,main="Monthly Usage")
acf(ts_monthly)
Since we eliminated the weekly and monthly correlation, autocorrelation decreased significantly
Monthly_additive<-decompose(ts_monthly, type = "additive")
plot(Monthly_additive)
plot(Monthly_additive$seasonal[1:70], type='l')
From the seasonality analysis investigated on different time regions, we encountered a clear seasonality on daily and weekly basis. There was also a seasonality on yearly basis but it was not significant as the others therefore, to imply daily and monthly seasonality I will use a frequency of 7*24=128
Constructing a time series with frequency 168, and decomposing the series with additive decomposition
model_ts<-ts(hourly,start(2016,1),frequency = 168)
model_ts_additive<-decompose(model_ts, type = "additive")
plot(model_ts_additive)
plot(model_ts_additive$seasonal[1:500], type='l')
summary(ur.kpss(model_ts_additive$random))
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 18 lags.
##
## Value of test-statistic is: 0.0042
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
When we check the stationarity of random component, we see that the value of test-statistic is lower than the critical values, showing that the random component is stationary. We can use this data for our model.
Subtracting the seasonal component
model_deseasonalized<-model_ts-model_ts_additive$seasonal
plot(model_deseasonalized[0:10000], type = 'l')
acf(model_deseasonalized)
When we check the autocorrelation, we still encounter seasonality since we haven’t removed the yearly seasonality
Subtracting the trend
model_detrend<-model_deseasonalized-model_ts_additive$trend
plot(model_detrend[0:10000], type = 'l')
acf(model_detrend,na.action = na.omit)
Subtracting also the trend, ACF values lowered significantly however we are still out of the confidence interval. In this context, we should consider lags however since we are trying to construct an ARMA model I will continue with this model.
##ARMA Model
Checking the p values to find the lowest AIC value
Model_1<-arima(model_ts_additive$random, order=c(1,0,0))
Model_2<-arima(model_ts_additive$random, order=c(2,0,0))
Model_3<-arima(model_ts_additive$random, order=c(3,0,0))
Model_4<-arima(model_ts_additive$random, order=c(4,0,0))
Model_5<-arima(model_ts_additive$random, order=c(5,0,0))
Model_6<-arima(model_ts_additive$random, order=c(6,0,0))
AIC(Model_1)
## [1] 733654.4
AIC(Model_2)
## [1] 722906
AIC(Model_3)
## [1] 722896.7
AIC(Model_4)
## [1] 722855.1
AIC(Model_5)
## [1] 722855.1
AIC(Model_6)
## [1] 722806.9
AIC values are the same for Model_4 and Model_5, I will choose the lower one to ensure simplicity
Model_1<-arima(model_ts_additive$random, order=c(0,0,1))
Model_2<-arima(model_ts_additive$random, order=c(0,0,2))
Model_3<-arima(model_ts_additive$random, order=c(0,0,3))
Model_4<-arima(model_ts_additive$random, order=c(0,0,4))
Model_5<-arima(model_ts_additive$random, order=c(0,0,5))
Model_6<-arima(model_ts_additive$random, order=c(0,0,6))
AIC(Model_1)
## [1] 771819.8
AIC(Model_2)
## [1] 747445.1
AIC(Model_3)
## [1] 734563
AIC(Model_4)
## [1] 728972.5
AIC(Model_5)
## [1] 725530.1
AIC(Model_6)
## [1] 724500.2
AIC values will keep decreasing since it is time consuming and to ensure simplicity, I will stop here and go with q=5
I will check the adjacent to see if we can find a better AIC value, and also lowering the values of parameters will decrease the complexity of the model.
Model1<-arima(model_ts_additive$random, order=c(4,0,5))
Model2<-arima(model_ts_additive$random, order=c(3,0,4))
AIC(Model1)
## [1] 722820.8
AIC(Model2)
## [1] 722805.9
Model2 resulted better, therefore I will continue with Model2
Fitted<-model_ts_additive$random-Model2$residuals
modeltransformed<-Fitted+model_ts_additive$seasonal+model_ts_additive$trend
After fitting the model let’s check the fitted model with the actual data
plot(model_ts, type ='l')
points(modeltransformed,type='l', col = 2, lty = 2)
plot(model_ts[40000:40500], type ='l')
points(modeltransformed[40000:40500],type='l', col = 2, lty = 2)
Since we are trying to forecast 14 days in an hourly basis, n.ahead is set to 14*24=336 and since 2016-1-1 is friday, I started the seasonal component from 145
model_forecast<-predict(Model2, n.ahead = 336)$pred
model_forecast=ts(model_forecast)
last_trend <-tail(model_ts_additive$trend[!is.na(model_ts_additive$trend)],1)
seasonality=model_ts_additive$seasonal[145:480]
model_forecast=model_forecast+seasonality+last_trend
plot(model_forecast)
data_path='/Users/tolgaerdogan/Documents/GitHub/spring21-TolgaErdogann/Data/RealTimeConsumption-06052021-20052021.csv'
consumptionactual=fread(data_path)
consumptionactual[,Date:=as.Date(Date)]
actual_ts=ts(consumptionactual$Consumption)
plot(actual_ts)
points(model_forecast,type='l', col = 2, lty = 2)
Our model has done a good job in terms of seasonality however since the trend is assumed constant, it couldn’t catch some days.
To calculate daily bias data is transformed on a daily basis
consumptionactual=consumptionactual[, ConsumptionForecast:=as.numeric(model_forecast)]
consumptionactual=consumptionactual[, Error:=Consumption-ConsumptionForecast]
consumptionactual=consumptionactual[, AbsError:=abs(Error)]
consumptionactual=consumptionactual[, Bias:=Error/Consumption]
consumptionactual=consumptionactual[, Ape:=AbsError/Consumption]
Errors = consumptionactual[, list(ConsumptionActual=mean(Consumption, na.rm=TRUE), ConsumptionForecast=mean(ConsumptionForecast, na.rm=TRUE), dailybias=mean(Bias, na.rm=TRUE),dailymape=mean(Ape, na.rm=TRUE)), by=list(as.Date(Date))]
Errors
## as.Date ConsumptionActual ConsumptionForecast dailybias dailymape
## 1: 2021-05-06 35853.67 32916.49 0.08309171 0.08309171
## 2: 2021-05-07 35248.29 32628.16 0.07530761 0.07530761
## 3: 2021-05-08 34460.27 31015.15 0.10026353 0.10026353
## 4: 2021-05-09 31458.78 27751.19 0.11802841 0.11802841
## 5: 2021-05-10 33445.54 31800.19 0.05182941 0.05643824
## 6: 2021-05-11 33621.64 32627.22 0.03072259 0.04431398
## 7: 2021-05-12 29342.16 32825.85 -0.12036757 0.15064868
## 8: 2021-05-13 25095.06 32916.49 -0.31269227 0.31269227
## 9: 2021-05-14 26060.54 32628.16 -0.25480196 0.25480196
## 10: 2021-05-15 27191.20 31015.15 -0.14369845 0.14369845
## 11: 2021-05-16 28234.82 27751.19 0.01353900 0.04614975
## 12: 2021-05-17 33801.75 31800.19 0.05721474 0.05721474
## 13: 2021-05-18 35629.71 32627.22 0.08372646 0.08372646
## 14: 2021-05-19 34572.84 32825.85 0.05131077 0.05131077
Daily bias and daily MAPE are seemed to be larger on the the days that model couldn’t catch actual consumptions, however in general Bias and MAPE values are small enough.
In this report I tried to construct an ARMA model to forecast the electricity consumption of Turkey for 2 weeks in an hourly basis. Frequency of 168 in hour terms is chosen for the seasonality and I tried to fit an ARMA model on that frequency. Comparing different p and q values ARMA(3,0,4) resulted best and used for the model.